<!DOCTYPE html>
<html lang="en">
<head>
	<meta charset="UTF-8">
	<meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1">
	<title>归一化词元 | Elasticsearch: 权威指南 | Elastic</title>
    <!-- Give IE8 a fighting chance -->
    <!--[if lt IE 9]>
    <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
    <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script>
    <![endif]-->
	<link rel="stylesheet" type="text/css" href="../static/styles.css" />
</head>
<body>
<div class="main-container">
    <section id="content">
        
        <div class="content-wrapper">
            <section id="guide" lang="zh_cn">
                <div class="container">
                    <div class="row">
                        <div class="col-xs-12 col-sm-8 col-md-8 guide-section">
                            <div style="color:gray; word-break: break-all; font-size:12px;">原文地址: <a href="https://www.elastic.co/guide/cn/elasticsearch/guide/current/token-normalization.html" rel="nofollow">https://www.elastic.co/guide/cn/elasticsearch/guide/current/token-normalization.html</a>, 版权归 www.elastic.co 所有<br/>
                            英文版地址: <a href="https://www.elastic.co/guide/en/elasticsearch/guide/current/token-normalization.html" rel="nofollow">https://www.elastic.co/guide/en/elasticsearch/guide/current/token-normalization.html</a>
                            </div>
                        <!-- start body -->
                  <div class="page_header">
<b>请注意:</b><br>本书基于 Elasticsearch 2.x 版本，有些内容可能已经过时。
</div>
<div id="content">
<div class="breadcrumbs">
<span class="breadcrumb-link"><a href="index.html">Elasticsearch: 权威指南</a></span>
»
<span class="breadcrumb-link"><a href="languages.html">处理人类语言</a></span>
»
<span class="breadcrumb-node">归一化词元</span>
</div>
<div class="navheader">
<span class="prev">
<a href="char-filters.html">« 整理输入文本</a>
</span>
<span class="next">
<a href="lowercase-token-filter.html">举个例子 »</a>
</span>
</div>
<div class="chapter">
<div class="titlepage"><div><div>
<h2 class="title">
<a id="token-normalization"></a>归一化词元<a class="edit_me edit_me_private" rel="nofollow" title="Editing on GitHub is available to Elastic" href="https://github.com/elasticsearch-cn/elasticsearch-definitive-guide/edit/cn/220_Token_normalization/00_Intro.asciidoc">edit</a>
</h2>
</div></div></div>
<p>把文本切割成词元(token)只是这项工作的一半。为了让这些词元(token)更容易搜索, 这些词元(token)需要被 <em>归一化</em>(<em>normalization</em>)--这个过程会去除同一个词元(token)的无意义差别，例如大写和小写的差别。可能我们还需要去掉有意义的差别, 让 <code class="literal">esta</code>、<code class="literal">ésta</code> 和 <code class="literal">está</code> 都能用同一个词元(token)来搜索。你会用 <code class="literal">déjà vu</code> 来搜索，还是 <code class="literal">deja vu</code>?</p>
<p>这些都是语汇单元过滤器的工作。语汇单元过滤器接收来自分词器(tokenizer)的词元(token)流。还可以一起使用多个语汇单元过滤器，每一个都有自己特定的处理工作。每一个语汇单元过滤器都可以处理来自另一个语汇单元过滤器输出的单词流。</p>






</div>
<div class="navfooter">
<span class="prev">
<a href="char-filters.html">« 整理输入文本</a>
</span>
<span class="next">
<a href="lowercase-token-filter.html">举个例子 »</a>
</span>
</div>
</div>

                  <!-- end body -->
                        </div>
                        <div class="col-xs-12 col-sm-4 col-md-4" id="right_col">
                        
                        </div>
                    </div>
                </div>
            </section>
        </div>
    </section>
</div>
<script src="../static/cn.js"></script>
</body>
</html>